A minimum description length approach to grammar inference
Abstract
We describe a new abstract model for the computational learning of grammars. The model deals with a learning process in which an algorithm is given as input a large set of training sentences that belong to some unknown grammar. The algorithm then tries to infer this grammar. Our model is based on the well-known Minimum Description Length Principle. It is quite close to, but more general than, several other existing approaches. We have shown that one of these approaches (based on n-gram statistics) coincides exactly with a restricted version of our own model. We have used a restricted version of the algorithm implied by the model to find classes of related words in natural language texts. It turns out that for this task, which can be seen as a 'degenerate' case of grammar learning, our approach gives quite good results. As opposed to many other approaches, it also provides a clear 'stopping criterion' indicating at what point the learning process should stop.
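As a rough illustration of how a description-length score can both rank candidate word classes and supply a stopping criterion, here is a minimal Python sketch. It is not the algorithm from the paper: the class-bigram model, the flat cost of `bits_per_param` bits per stored parameter, and the names `description_length` and `greedy_merge` are all assumptions made for this example. The total cost is the two-part code length L(model) + L(data | model), and merging stops as soon as no merge of two classes shortens it.

```python
import math
from collections import Counter
from itertools import combinations


def description_length(sentences, word_class, bits_per_param=8.0):
    """Two-part code length in bits: L(model) + L(data | model).

    Assumed model: a class-bigram model with maximum-likelihood
    estimates. Each non-zero parameter costs a flat `bits_per_param`.
    """
    class_bigrams = Counter()   # (previous class, class) transition counts
    context_counts = Counter()  # how often each class occurs as a context
    word_counts = Counter()     # token count per word
    class_counts = Counter()    # token count per class
    for sentence in sentences:
        prev = "<s>"  # sentence-start marker
        for word in sentence:
            cls = word_class[word]
            class_bigrams[(prev, cls)] += 1
            context_counts[prev] += 1
            word_counts[word] += 1
            class_counts[cls] += 1
            prev = cls
    data_bits = 0.0
    # Cost of encoding the class sequence.
    for (prev, _cls), n in class_bigrams.items():
        data_bits -= n * math.log2(n / context_counts[prev])
    # Cost of encoding each word given its class.
    for word, n in word_counts.items():
        data_bits -= n * math.log2(n / class_counts[word_class[word]])
    model_bits = (len(class_bigrams) + len(word_counts)) * bits_per_param
    return model_bits + data_bits


def greedy_merge(sentences, bits_per_param=8.0):
    """Merge word classes greedily until no merge reduces the total length."""
    vocab = sorted({w for s in sentences for w in s})
    word_class = {w: w for w in vocab}  # start with one class per word
    best = description_length(sentences, word_class, bits_per_param)
    while True:
        candidate = None
        for a, b in combinations(sorted(set(word_class.values())), 2):
            # Trial assignment in which class b is folded into class a.
            trial = {w: (a if c == b else c) for w, c in word_class.items()}
            bits = description_length(sentences, trial, bits_per_param)
            if bits < best:
                best, candidate = bits, trial
        if candidate is None:  # stopping criterion: no merge helps
            return word_class, best
        word_class = candidate


# Toy corpus, purely illustrative.
sentences = [
    "the cat sleeps".split(),
    "the dog sleeps".split(),
    "a cat eats".split(),
    "a dog eats".split(),
]
classes, total_bits = greedy_merge(sentences)
print(total_bits, classes)
```

The value chosen for `bits_per_param` controls the trade-off: a higher cost per parameter favours fewer, larger classes, while a cost near zero leaves every word in its own class.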
Related resources
Evolving Stochastic Context-Free Grammars from Examples Using a Minimum Description Length Principle
This paper describes an evolutionary approach to the problem of inferring stochastic context-free grammars from finite language samples. The approach employs a genetic algorithm, with a fitness function derived from a minimum description length principle. Solutions to the inference problem are evolved by optimizing the parameters of a covering grammar for a given language sample. We provide details...
Unsupervised Grammar Inference Using the Minimum Description Length Principle
Context Free Grammars (CFGs) are widely used in programming language descriptions, natural language processing, compilers, and other areas of software engineering where there is a need for describing the syntactic structures of programs. Grammar inference (GI) is the induction of CFGs from sample programs and is a challenging problem. We describe an unsupervised GI approach which uses simplicit...
Incrementally Inferring Context-Free Grammars for Domain-Specific Languages
Grammatical inference (or grammar inference) has been applied to various problems in areas such as computational biology, and speech and pattern recognition but its application to the programming language problem domain has been limited. We propose a new application area for grammar inference which intends to make domain-specific language development easier and finds a second application in ren...
Learning context-free grammars to extract relations from text
In this paper we propose a novel relation extraction method, based on grammatical inference. Following a semisupervised learning approach, the text that connects named entities in an annotated corpus is used to infer a context free grammar. The grammar learning algorithm is able to infer grammars from positive examples only, controlling overgeneralisation through minimum description length. Eva...
Maintaining regularity and generalization in data using the minimum description length principle and genetic algorithm: Case of grammatical inference
In this paper, a genetic algorithm with minimum description length (GAWMDL) is proposed for grammatical inference. The primary challenge in identifying a language of infinite cardinality from a finite set of examples is knowing when to generalize and when to specialize from the training data. The minimum description length principle, which has been incorporated to address this issue, is discussed in this paper...